#### Part 5

#### CHAPTER 3

# Architecture and Organization



Alan Clements


These slides are provided with permission from the copyright holder for CS2208 use only. The slides must not be reproduced or provided to anyone outside the class.

All downloaded copies of the slides are for personal use only.

Students must destroy these copies within 30 days after receiving the course's final assessment.



#### Pseudo instructions

- □ A *pseudo instruction* is an operation that the programmer can use when writing code.
  - o The pseudo instruction itself <u>does not</u> have a <u>direct</u> machine language equivalent.
    - For example, you <u>can't</u> write MOV r0,#0x12345678 to load register r0 with the 32-bit value 0x12345678, because the whole instruction is only 32 bits long in total.
    - Instead, you can use the pseudo instruction LDR **r0**, =0x12345678. Note that the constant is prefixed by =, not #, and that the mnemonic is LDR: MOV **r0**, =0x12345678 is NOT valid.
      - The assembler will generate suitable code to carry out the same action:
        - o it stores the constant 12345678<sub>16</sub> in a so-called literal pool (or constant pool) somewhere in memory after the current AREA, and
        - o it *generates suitable code* to load the stored constant 12345678<sub>16</sub> into r0.

#### Pseudo instructions

- □ Another *pseudo instruction* is ADR **r0**, label, which loads the <u>32-bit address</u> of the line 'label' into register r0, using the appropriate code generated by the assembler.
- ☐ The following fragment demonstrates the use of the ADR pseudo instruction.

ADR r1, MyArray ; set up r1 to point to MyArray, i.e., load r1 with the 32-bit address of MyArray

LDR **r3**, [r1] ; read an element using the pointer (this LDR is **NOT** a pseudo instruction)

MyArray DCD 0x12345678 ; the address of this data will be loaded into r1

- ☐ The programmer does not have to know how the assembler generates suitable code to implement such *pseudo instructions*. But as a student, you need to know it!
- ☐ All this is done automatically by the assembler.
- ☐ It can be realized by utilizing *program counter relative addressing*.

#### **Program Counter Relative Addressing**

- □ Register indirect *relative addressing* allows us to
  - o specify the location of an operand with respect to a register value.
- ☐ LDR **r0**, [r1] specifies that the operand address is in r1.
- □ LDR **r0**, [r1, #16] specifies that the operand is 16 bytes onward from the address in r1.





□ Suppose that we use r15, i.e., the PC, to generate an address by writing

LDR **r0**, [PC, #16]

- o The operand is 16 bytes onward from the PC,
- o i.e., 8 + 16 = 24 bytes from the current instruction.
  - The ARM's PC <u>in most cases</u> is 8 bytes ahead of the instruction currently being executed, due to <u>pipelining</u> (the processor automatically fetches the following instructions before the current one has been executed).

[Figure: data path between memory and registers, showing the program counter, memory address register, main store (memory), memory buffer register, the op-code and operand fields of the instruction, and the register file (registers r0 and r1).]

☐ If the program and its data are relocated elsewhere in memory, the *relative offset* does not change.
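The address arithmetic above can be checked with a small Python model. This is only a sketch: the function name `pc_relative_address` and the two load addresses are hypothetical, and the 8-byte pipeline offset is taken from the slide as an assumption that holds "in most cases".

```python
# Assumption (from the slide): when an instruction reads the PC, it sees
# its own address plus 8 bytes.
PIPELINE_OFFSET = 8


def pc_relative_address(instruction_address, offset):
    """Address accessed by LDR r0, [PC, #offset] stored at instruction_address."""
    pc_value = instruction_address + PIPELINE_OFFSET
    return pc_value + offset


# The same code placed at two hypothetical addresses: the distance between
# the instruction and its operand stays 24 bytes after relocation.
for base in (0x00001000, 0x00040000):
    target = pc_relative_address(base, 16)
    print(hex(base), hex(target), target - base)
```

Because only the distance is encoded in the instruction, moving the code and its data together leaves the instruction itself unchanged.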



#### **ARM's Data-Processing Instructions**

There are 4 groups of ARM's instructions

- ✓ *Data-Processing* instructions
- ✓ Branching instructions
- ✓ Loading/Storing a single-register instructions
- ✓ Loading/Storing multi-registers instructions

| Operation      | Instructions            | Flag-setting forms (S suffix)        |
|----------------|-------------------------|--------------------------------------|
| Addition       | ADD, ADC                | ADDS, ADCS                           |
| Subtraction    | SUB, RSB                | SUBS, RSBS                           |
| Negation       | NEG                     | NEGS                                 |
| Move           | MOV, MVN                | MOVS, MVNS                           |
| Multiplication | MUL, MLA                | MULS, MLAS                           |
| Bitwise logic  | AND, ORR, EOR, BIC      | ANDS, ORRS, EORS, BICS               |
| Comparison     | CMP, CMN, TEQ, TST      | (none; these always set the flags)   |
| Shift          | LSL, LSR, ASR, ROR, RRX | LSLS, LSRS, ASRS, RORS, RRXS         |

To learn more about any ARM assembly instruction, you can Google the words ARM Keil followed by the operation code. For example, to learn more about the ADD instruction, Google "ARM Keil add". The relevant page is usually the first link.


- ☐ A simple ADD (and ADDS) instruction adds two 32-bit values.
- □ ARM also has an ADC (add with carry), as well as ADCS, that adds two 32-bit values together with the carry bit.
  - o This allows extended precision arithmetic as Figure 3.21 demonstrates.

FIGURE 3.21 Single- and extended-precision addition



(a) Single-precision addition. When r0 is added to r1, the result is loaded into r2, and the carry bit is loaded into the carry flag.

(b) Double-precision extended addition. When r0 is added to r2, any carry out is stored in the carry bit. When r1 is added to r3, the carry bit is added to their sum. In other words, the carry out generated by ADDS r4, r0, r2 becomes the carry in used by ADC r5, r1, r3.


ADDS and ADCS update ALL flags according to the result
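The carry chain used for extended-precision addition can be modelled in a few lines of Python. This is a sketch under stated assumptions: the helper names `adds`, `adc`, and `add64` are hypothetical, registers are modelled as unsigned 32-bit integers, and only the carry flag is tracked.

```python
MASK32 = 0xFFFFFFFF


def adds(a, b):
    """Model of ADDS: return (32-bit sum, carry-out)."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def adc(a, b, carry_in):
    """Model of ADC: 32-bit sum of a, b, and the incoming carry."""
    return ((a & MASK32) + (b & MASK32) + carry_in) & MASK32


def add64(x, y):
    """Add two 64-bit values held as register pairs (r1:r0) and (r3:r2)."""
    r0, r1 = x & MASK32, (x >> 32) & MASK32   # low and high words of x
    r2, r3 = y & MASK32, (y >> 32) & MASK32   # low and high words of y
    r4, carry = adds(r0, r2)                  # ADDS r4, r0, r2
    r5 = adc(r1, r3, carry)                   # ADC  r5, r1, r3
    return (r5 << 32) | r4


print(hex(add64(0x00000001FFFFFFFF, 0x0000000000000001)))  # 0x200000000
```

The low words overflow, so the carry produced by the first addition is what makes the high word come out as 2 rather than 1.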


☐ Besides the *normal subtraction* (SUB), ARM also provides *reverse subtraction* (RSB)

```
SUB r1, r2, r3 ; [r1] ← [r2] - [r3]
RSB r1, r2, r3 ; [r1] ← [r3] - [r2]
```

□ RSB is useful because ARM treats its two source operands differently: only the last operand may be a constant.

- o For example, to perform [r1] ← 10 - [r2]:

```
SUB r1, #10, r2 ; THIS IS WRONG: a constant can not be the first source operand
RSB r1, r2, #10 ; CORRECT: [r1] ← 10 - [r2]
```

□ Note that ADD **r1**, #5 means ADD **r1**, r1, #5. In a 3-operand instruction, if the 1<sup>st</sup> and 2<sup>nd</sup> operands are the same register, the instruction may be abbreviated by typing the register once. The assembler takes care of this shorthand and repeats the operand.

☐ Negation subtracts a number from 0 (arithmetic complement, i.e., 2's complement).

- The end effect is the same as multiplying the operand by -1.
- ARM does *not* have a negation instruction as such.
- Instead, ARM provides a pseudo instruction called NEG, which has only two operands, e.g., NEG r1, r2.
- o The RSB instruction is utilized to implement NEG.
  - To negate r2 (i.e., calculate 0 - [r2]) and store the result in r1:

NEG **r1**, r2

or

RSB **r1**, r2, #0

  - To negate r2 (i.e., calculate 0 - [r2]) and store the result in r2:

NEG **r2**, r2 (this can not be shortened to NEG r2)

or

RSB **r2**, r2, #0

or simply

RSB **r2**, #0
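A short Python model may help confirm that reverse subtraction from zero yields the 2's complement. It is a sketch only: `rsb` and `neg` are hypothetical helper names, and registers are modelled as unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF


def rsb(rn, operand2):
    """Model of RSB Rd, Rn, operand2: Rd = operand2 - Rn (32-bit wrap-around)."""
    return (operand2 - rn) & MASK32


def neg(rm):
    """Model of NEG Rd, Rm, implemented as RSB Rd, Rm, #0."""
    return rsb(rm, 0)


print(rsb(3, 10))     # 10 - 3 = 7
print(hex(neg(5)))    # 0xfffffffb, the 32-bit 2's complement of 5
```

Negating twice returns the original value, as expected of a 2's-complement negation.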

#### ARM's Data-Processing Instructions (Arithmetic Instructions: Move and Move NOT)

- □ ARM provides a MOV instruction that copies the value of the second operand into the first operand
  - o To copy the content of r1 to r0: MOV **r0**, r1
  - o The MOV instruction has only two operands.
- □ ARM also provides MVN (move not), which performs a <u>bitwise</u> logical complement operation (i.e., logical NOT) on the value of the second operand (i.e., flipping each zero to one and each one to zero) and places the result into the first operand
  - o To copy the *logical complement* of the content of r1 to r0: MVN **r0**, r1
  - o The MVN instruction has only two operands.

**MUL can not** be shortened to two operands

- ☐ The multiply instruction, MUL Rd, Rm, Rs
  - o Takes two 32-bit signed integer values from registers Rm and Rs
  - o Forms their 64-bit product
  - o Stores in 32-bit register Rd the lower-order 32 bits of the 64-bit product.

```
MOV r0, #121   ; load r0 with 121
MOV r1, #96    ; load r1 with 96
MUL r2, r0, r1 ; r2 = r0 × r1
```

- □ A 32-bit by 32-bit multiplication is *truncated* to the lower-order 32 bits.
- ☐ In MUL instruction, *same register* <u>can't</u> be used to specify both the <u>destination</u> Rd and the <u>operand</u> Rm,
  - o because ARM's implementation uses Rd as a temporary register during multiplication. This is a feature of the ARM processor.
- □ ARM *does not* allow multiply by a constant
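The truncation to the lower-order 32 bits can be illustrated with a small Python model. This is a sketch: `mul` is a hypothetical helper that models registers as unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF


def mul(rm, rs):
    """Model of MUL Rd, Rm, Rs: keep only the low 32 bits of the product."""
    return (rm * rs) & MASK32


print(mul(121, 96))                 # 11616, the product fits in 32 bits
print(hex(mul(0x10000, 0x10000)))   # 0x0, the product 2^32 is truncated away
```

The second call shows why the programmer must make sure the operands are small enough when the full product matters.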

- □ ARM has a *multiply and accumulate* instruction, MLA, that
  - o performs a multiplication and adds the product to a running total.
- ☐ MLA instruction has a four-operand form:

```
MLA Rd, Rm, Rs, Rn ; [Rd] = [Rm] × [Rs] + [Rn]
```

- ☐ MLA can not be shortened to three operands.
- ☐ As in the normal MUL instruction,
  - o a 32-bit by 32-bit multiplication is *truncated* to the lower-order 32 bits, and
  - o the *same register* <u>can't</u> be used to specify both the *destination* **Rd** and the *operand* Rm.
- ☐ ARM *does not* allow multiply by a constant

- □ ARM's *multiply and accumulate* supports the calculation of an *inner product* (a.k.a. *dot product*).
- ☐ The inner product of two vectors  $\mathbf{a} = [a_1, a_2, ..., a_n]$  and  $\mathbf{b} = [b_1, b_2, ..., b_n]$  is defined as

$$s = \mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \dots + a_n b_n$$

☐ The following program shows how the multiply and accumulate instruction is used to form the inner product between n-component vectors, Vector1 and Vector2

```
        AREA MultiplyAndAccumulateExample, CODE, READONLY
        ENTRY
n       EQU  4                 ;4 components in this example
        MOV  r4, #n            ;r4 is the loop counter
        MOV  r3, #0            ;clear the inner product
        ADR  r5, Vector1       ;r5 points to vector 1
        ADR  r6, Vector2       ;r6 points to vector 2
Loop    LDR  r0, [r5], #4      ;REPEAT read a component of vector 1
                               ;  and update the pointer
        LDR  r1, [r6], #4      ;get the matching component of vector 2
                               ;  and update the pointer
        MLA  r3, r0, r1, r3    ;add new product term to the total
                               ;  (r3 = r3 + r0 × r1)
        SUBS r4, r4, #1        ;decrement the loop counter
                               ;  (and remember to set the CCR)
        BNE  Loop              ;UNTIL all done
Vector1 DCD  1,2,3,4
Vector2 DCD  2,3,4,5
        END
```
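To check the expected result of the loop, the same REPEAT ... UNTIL structure can be mirrored in Python. This is a sketch with hypothetical names (`mla`, `inner_product`); it models 32-bit registers and assumes the vectors have the same non-zero length.

```python
MASK32 = 0xFFFFFFFF


def mla(rm, rs, rn):
    """Model of MLA Rd, Rm, Rs, Rn: Rd = Rm * Rs + Rn, truncated to 32 bits."""
    return (rm * rs + rn) & MASK32


def inner_product(vector1, vector2):
    r4 = len(vector1)          # loop counter (n)
    r3 = 0                     # running total
    index = 0                  # stands in for the post-incremented pointers
    while True:                # REPEAT
        r0 = vector1[index]
        r1 = vector2[index]
        index += 1
        r3 = mla(r0, r1, r3)   # r3 = r3 + r0 * r1
        r4 -= 1                # SUBS r4, r4, #1
        if r4 == 0:            # BNE Loop falls through when r4 reaches 0
            break
    return r3


print(inner_product([1, 2, 3, 4], [2, 3, 4, 5]))  # 1*2 + 2*3 + 3*4 + 4*5 = 40
```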


- ☐ In addition to the 32-bit MUL and MLA, ARM includes several forms of multiplication instruction, including
  - o UMULL Unsigned long multiply (Rm × Rs yields a 64-bit product in two registers)
  - o UMLAL Unsigned long multiply-accumulate
  - o SMULL Signed long multiply
  - o SMLAL Signed long multiply-accumulate


□ ARM *does not implement a division* operation

☐ If needed, the programmer must write a suitable division routine to implement division

# ARM's Data-Processing Instructions (Bitwise Logical Operations)

□ Logical operations are known as *bitwise operations* because they are applied to the individual bits of a register

```
AND r2, r1, r0 ; Example: 11001010 AND 00001111 → 00001010
ORR r2, r1, r0 ; Example: 11001010 OR  00001111 → 11001111
EOR r2, r1, r0 ; Example: 11001010 XOR 00001111 → 11000101
MVN r2, r0     ; Example: NOT 11001010          → 00110101
```

Only 8 bits are shown in these examples; the rest of the bits of the operands are zeros. Note that the mnemonic is ORR, not OR.

□ The MVN operation can also be performed by using an EOR with the second operand equal to FFFFFFFF<sub>16</sub> (i.e., 32 1's <u>in a register</u>)

- o The value of $x \oplus (11\ldots1111)_2$ is equal to NOT $x$.

```
MOV r1, #0xFFFFFFFF
EOR r2, r1, r0 ; Same as MVN r2, r0
```
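The equivalence between MVN and an EOR with all ones can be verified with a quick Python model. This is a sketch: `mvn` and `eor` are hypothetical helpers operating on unsigned 32-bit values.

```python
MASK32 = 0xFFFFFFFF


def mvn(value):
    """Model of MVN: bitwise complement of a 32-bit value."""
    return ~value & MASK32


def eor(a, b):
    """Model of EOR: bitwise exclusive OR of two 32-bit values."""
    return (a ^ b) & MASK32


for r0 in (0x00000000, 0x000000CA, 0x12345678, 0xFFFFFFFF):
    r1 = 0xFFFFFFFF
    print(hex(r0), hex(eor(r1, r0)), eor(r1, r0) == mvn(r0))
```

Every line of output ends in `True`, because XOR with 1 flips a bit and XOR with 0 would leave it unchanged.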


# ARM's Data-Processing Instructions (Bitwise Logical Operations)

- □ Example 1: suppose that
  - o register r0 contains the 8 bits bbbbbbxx,
  - o register r1 contains the 8 bits bbbyyybb and
  - o register r2 contains the 8 bits zzzbbbbb, where
  - o xx, yyy, and zzz represent the bits of desired fields and
  - o the b's are unwanted bits.
- $\square$  We wish to pack these bits to get the final value zzzyyyxx stored in r0.
- ☐ We can achieve this by:

```
AND r0,r0,#2_11       ; Mask r0 to two bits xx
AND r1,r1,#2_11100    ; Mask r1 to three bits yyy
AND r2,r2,#2_11100000 ; Mask r2 to three bits zzz
ORR r0,r0,r1          ; Merge r1 and r0 to get 000yyyxx
ORR r0,r0,r2          ; Merge r2 and r0 to get zzzyyyxx
```
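The mask-and-merge sequence of Example 1 can be traced with concrete bit patterns in Python. This is a sketch: the function name `pack_fields` and the sample register values are hypothetical, chosen so that every unwanted b bit differs from its neighbours.

```python
def pack_fields(r0, r1, r2):
    """Pack xx (from r0), yyy (from r1), and zzz (from r2) into zzzyyyxx."""
    r0 &= 0b00000011     # AND r0,r0,#2_11        keep xx
    r1 &= 0b00011100     # AND r1,r1,#2_11100     keep yyy
    r2 &= 0b11100000     # AND r2,r2,#2_11100000  keep zzz
    r0 |= r1             # ORR r0,r0,r1           000yyyxx
    r0 |= r2             # ORR r0,r0,r2           zzzyyyxx
    return r0


# r0 = bbbbbbxx with xx = 01, r1 = bbbyyybb with yyy = 101,
# r2 = zzzbbbbb with zzz = 011
result = pack_fields(0b10101001, 0b11010111, 0b01100000)
print(format(result, "08b"))  # 01110101
```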

The Keil assembler uses a prefix to indicate the number base:

- o 2\_ to indicate binary
- o 8\_ to indicate octal
- o 0x or & to indicate hexadecimal
- o no prefix to indicate decimal

# ARM's Data-Processing Instructions (Bitwise Logical Operations)

- □ Example 2: suppose we have an 8-bit value abcdefgh in r0 and
- □ we wish to
  - o clear bits **b** and **d**,
  - o set bits **a**, **e**, and **f**, and
  - o toggle (invert) bit **h**,
  - o i.e., generate the output $10c011g\bar{h}$
- ☐ We can achieve this by:

```
AND r0, r0, #2_10101111 ; Clear bits b and d to get a0c0efgh
ORR r0, r0, #2_10001100 ; Set bits a, e, and f to get 10c011gh
EOR r0, r0, #2_1        ; Toggle bit h to get the result
```
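The three masks of Example 2 can be tried on concrete values in Python. This is a sketch: `clear_set_toggle` is a hypothetical name and the inputs are arbitrary 8-bit samples.

```python
def clear_set_toggle(value):
    """Apply the AND / ORR / EOR sequence to an 8-bit value abcdefgh."""
    value &= 0b10101111   # AND: clear bits b and d  -> a0c0efgh
    value |= 0b10001100   # ORR: set bits a, e, f    -> 10c011gh
    value ^= 0b00000001   # EOR: toggle bit h        -> 10c011g(not h)
    return value


# abcdefgh = 01110010, so c = 1, g = 1, h = 0 and the result is 10101111
print(format(clear_set_toggle(0b01110010), "08b"))
```

AND clears wherever the mask holds 0, ORR sets wherever it holds 1, and EOR toggles wherever it holds 1, which is why each step needs a different mask.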

# ARM's Data-Processing Instructions (Bitwise Logical Operations)

- □ *ARM* provides a *bit clear* instruction, **BIC**, that
  - o ANDs its first operand with the *complement* of its second operand.
- □ **Example:** suppose we have r1 = 10101010 and r2 = 00001111.
  - o The instruction BIC **r0**, r1, r2 yields 10100000
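The BIC example can be reproduced with a one-line Python model. This is a sketch: `bic` is a hypothetical helper working on unsigned 32-bit values.

```python
MASK32 = 0xFFFFFFFF


def bic(rn, operand2):
    """Model of BIC Rd, Rn, operand2: Rd = Rn AND (NOT operand2)."""
    return rn & ~operand2 & MASK32


print(format(bic(0b10101010, 0b00001111), "08b"))  # 10100000
```

Each 1 in the second operand marks a bit position to clear in the first.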

- ☐ In ARM, updating the condition flags can be *implicit* or *explicit*.
- □ Both implicit and explicit updates modify the contents of the condition code register (CCR), a.k.a. current program status register (CPSR), which can later be tested to determine whether execution continues in sequence or a branch is taken
  - o Example of an *implicit* update of the condition flags: SUBS **r1**, r1, r2
  - o Example of an *explicit* update of the condition flags: CMP r1, r2
    - This instruction evaluates r1 - r2 without storing the result, and sets the condition code register accordingly.

```
        CMP r1,r2      ;is r1 = r2?
        BEQ DoThis     ;if equal then goto DoThis
        ADD r1,r1,#1   ;else add 1 to r1
        B   Next       ;jump past the then part
        .
DoThis  SUB r1,r1,#1   ;subtract 1 from r1
Next    ...            ;both forks end up here
```

- □ ARM has *four* instructions in its *test-and-compare* group which *explicitly* update the condition code flags (i.e., no need to append an S to any of them)
  - o **CMP** (compare instruction)
    - Subtracts the second operand from the first and updates all flags
  - o **TEQ** (test equivalence instruction)
    - Determines whether two operands are equivalent or not (similar to EORS, except that the result is discarded)
  - o **TST** (test instruction)
    - Compares two operands by **ANDing** them together and *updates the flags* (similar to ANDS, except that the result is discarded)
    - Usually used to test individual bits, for example:
      - TST r0, #2\_00100000 ; AND r0 with 00100000 to test bit 5
      - BNE LowerCase ; if bit 5 is 1, jump to LowerCase (note that it is BNE, not BEQ)
  - o **CMN** (compare negative instruction)
    - 2's complements the second operand before performing the comparison

```
CMN r1, r2 ; evaluates [r1] - (-[r2])
           ; i.e., evaluates [r1] + [r2]
```

It is **CMN**, *not* **CPN**. Correct the book, page 178
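The reason the bit test is followed by BNE rather than BEQ can be seen by modelling the Z flag in Python. This is a sketch: `tst_z_flag` and `branch_taken_bne` are hypothetical helpers, and the ASCII letters are sample inputs chosen because lower-case letters have bit 5 set.

```python
MASK32 = 0xFFFFFFFF


def tst_z_flag(rn, operand2):
    """Model of TST: Z = 1 when (rn AND operand2) is zero; the result is discarded."""
    return 1 if ((rn & operand2) & MASK32) == 0 else 0


def branch_taken_bne(z_flag):
    """BNE branches when Z = 0, i.e., when the tested bit was 1."""
    return z_flag == 0


for letter in ("a", "A"):
    z = tst_z_flag(ord(letter), 0b00100000)   # TST r0, #2_00100000
    print(letter, "Z =", z, "branch to LowerCase:", branch_taken_bne(z))
```

When bit 5 is 1 the AND result is non-zero, so Z is cleared and the "not equal" condition holds.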


# ARM's Data-Processing Instructions (Shift Operations)

- □ Shift operations move bits one <u>or more</u> places <u>left</u> or <u>right</u>.
  - o Logical shifts
    - *insert a 0* in the vacated position.
  - o Arithmetic shifts
    - replicate the sign-bit during a right shift
  - o Circular shifts
    - *the bit shifted out of one end is shifted in at the other end*, i.e., the register is treated as a ring
  - o Circular shifts through carry
    - include the carry bit in the shift path

In a rotate operation, the bit shifted out is copied into the bit vacated at the other end (i.e., no bit is lost during a rotate). The bit shifted out is also copied into the carry bit.



#### ARM's Data-Processing Instructions (Shift Operations)

Examples of logical shifts on a 16-bit value

| Source string    | Direction | Number of shifts | Destination string |
|------------------|-----------|------------------|--------------------|
| 0110011111010111 | Left      | 1                | 1100111110101110   |
| 0110011111010111 | Left      | 2                | 1001111101011100   |
| 0110011111010111 | Left      | 3                | 0011111010111000   |
| 0110011111010111 | Right     | 1                | 0011001111101011   |
| 0110011111010111 | Right     | 2                | 0001100111110101   |
| 0110011111010111 | Right     | 3                | 0000110011111010   |

These examples are hypothetical, as ARM registers are 32 bits, not 16 bits.
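The 16-bit logical shifts can be reproduced in Python by masking the result to the register width. This is a sketch: `lsl`, `lsr`, and `bits` are hypothetical helpers, and the 16-bit width follows the hypothetical examples rather than a real ARM register.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def lsl(value, places):
    """Logical shift left: zeros enter on the right, excess bits are dropped."""
    return (value << places) & MASK


def lsr(value, places):
    """Logical shift right: zeros enter on the left."""
    return (value & MASK) >> places


def bits(value):
    """Format a value as a 16-character binary string."""
    return format(value, "016b")


source = 0b0110011111010111
for places in (1, 2, 3):
    print("Left ", places, bits(lsl(source, places)))
for places in (1, 2, 3):
    print("Right", places, bits(lsr(source, places)))
```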

#### ARM's Data-Processing Instructions (Shift Operations)

- ☐ The *rotate through carry* instruction (sometimes called *extended shift*) includes the carry bit in the shift path.
  - o The carry bit is shifted into the vacated bit of the word, and
  - o the bit shifted out of the word is shifted into the carry.
- ☐ If the carry = 1 and the eight-bit word to be shifted is 01101110, a rotate left through carry would give 11011101 and leave the carry = 0.
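The eight-bit rotate-left-through-carry example can be checked with a short Python model. This is a sketch: `rotate_left_through_carry` is a hypothetical helper, and the 8-bit default width matches the example rather than a 32-bit ARM register.

```python
def rotate_left_through_carry(value, carry, width=8):
    """Rotate left by one place through the carry; return (new value, new carry)."""
    mask = (1 << width) - 1
    carry_out = (value >> (width - 1)) & 1     # bit shifted out of the word
    result = ((value << 1) | carry) & mask     # old carry enters the vacated bit
    return result, carry_out


value, carry = rotate_left_through_carry(0b01101110, 1)
print(format(value, "08b"), carry)  # 11011101 0
```

Since the word and the carry together form a 9-bit ring, nine successive rotations restore the original word and carry.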

- □ **ARM** has no explicit shift operations!
- □ ARM combines shifting with other data processing operations, where
  - o the <u>second operand</u> in the arithmetic operation (i.e., the <u>LAST parameter in the assembly arithmetic instruction</u>) is allowed to be shifted <u>before</u> it is used.
  - o For example,

```
ADD r0, r1, r2, LSL #1 ; [r0] ← [r1] + [r2] × 2
```

- logically shift left the contents of r2,
- add the result to the contents of r1, and
- put the result in r0
- □ ARM also combines shifting with MOV and MVN instructions
  - o For example, MOV r3, r3, LSL #1 ; $[r3] \leftarrow [r3] \times 2$
- □ **ARM** provides **pseudo** shift instructions, which are translated to **MOV** instructions.
  - o For example, LSL r3, r3, #1 will be converted to MOV r3, r3, LSL #1. It can also be written simply as LSL r3, #1.

- □ ARM supports both *static* and *dynamic* shifts (except the *rotate through carry* instruction, which allows *only a single shift* per instruction)
  - o In *static shift*, the number of shift places
    - is determined *when the code is written*
    - can only have the following values, inclusive:
      - LSL: allowable values are from #0 to #31 (32 different values)
      - LSR: allowable values are from #1 to #32 (32 different values)
      - ASR: allowable values are from #1 to #32 (32 different values)
      - ROR: allowable values are from #1 to #31 (31 different values)

        The remaining value is used to encode RRX
  - o In *dynamic shift*, the number of shift places
    - is determined when the code is executed, i.e., at run time
- ☐ You can perform *dynamic shifts* as follows:

MOV **r4**, r3, LSL r2 ; $[r4] \leftarrow [r3] \times 2^{[r2]}$

or

LSL **r4**, r3, r2 ; $[r4] \leftarrow [r3] \times 2^{[r2]}$

This instruction

- o shifts the contents of r3 left by the value in r2 and
- o puts the result in r4.
- o If the value in r2 is $\geq 32$, zero will be stored in r4
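The behaviour of a dynamic left shift, including the case of 32 or more places, can be sketched in Python. The helper `lsl_dynamic` is hypothetical, registers are modelled as unsigned 32-bit integers, and the sketch assumes the shift count in r2 lies in the range 0 to 255.

```python
MASK32 = 0xFFFFFFFF


def lsl_dynamic(r3, r2):
    """Model of MOV r4, r3, LSL r2 for a shift count r2 in the range 0..255."""
    if r2 >= 32:
        return 0                      # every original bit has been shifted out
    return (r3 << r2) & MASK32        # multiply by 2^r2, truncated to 32 bits


print(lsl_dynamic(3, 4))              # 3 * 2^4 = 48
print(hex(lsl_dynamic(1, 31)))        # 0x80000000
print(lsl_dynamic(1, 32))             # 0
```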

□ ARM implements only the following five shifts

LSL logical shift left

LSR logical shift right

ASR arithmetic shift right

ROR rotate right

RRX rotate right through carry (one shift)


- □ Other shift operations must be synthesized by the programmer.
  - o An arithmetic shift left is effectively the same as a logical shift left.
  - o For a 32-bit value, an *n*-bit rotate left is identical to a (32 - *n*)-bit rotate right.
  - o Rotate left through carry can be implemented by means of ADCS r0, r0, r0 ; add r0 to r0 with carry and set the flags
    - The instruction computes r0 + r0 + C, i.e., $2 \times r0 + C$, which amounts to
      - shifting the content of r0 left by one place,
      - storing the value of C in the vacated least-significant bit, and
      - storing the shifted-out bit in the carry flag
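Both synthesized operations can be checked numerically in Python. This is a sketch with hypothetical helper names (`rol`, `ror`, `adcs_self`); registers are modelled as unsigned 32-bit integers and rotation counts are assumed to be between 1 and 31.

```python
MASK32 = 0xFFFFFFFF


def rol(value, n):
    """Rotate a 32-bit value left by n places (1 <= n <= 31)."""
    return ((value << n) | (value >> (32 - n))) & MASK32


def ror(value, n):
    """Rotate a 32-bit value right by n places (1 <= n <= 31)."""
    return ((value >> n) | (value << (32 - n))) & MASK32


def adcs_self(r0, carry):
    """Model of ADCS r0, r0, r0: return (2 * r0 + carry in 32 bits, carry-out)."""
    total = 2 * (r0 & MASK32) + carry
    return total & MASK32, total >> 32


print(hex(rol(0x12345678, 4)), hex(ror(0x12345678, 28)))  # both 0x23456781
print(adcs_self(0x80000001, 1))  # (3, 1): old carry enters bit 0, old bit 31 leaves
```

Doubling r0 performs the left shift, adding C fills the vacated bit, and the carry-out of the addition is exactly the bit that was shifted out.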